Predicting behavioral outcomes from neural activity is a crucial part of neuroscience research, with many applications in understanding how decisions are made. This report analyzes spike train data from the visual cortex of mice performing a decision-making task, in which each trial results in a success (feedback = 1) or a failure (feedback = -1). From the data of Steinmetz et al. (2019), 18 sessions from 4 mice were selected. We investigated the effectiveness of three classification models - Logistic Regression, Linear Discriminant Analysis (LDA), and k-Nearest Neighbors (kNN) - in predicting each trial's outcome from neural activity and the contrast levels of the stimulus.
Feature analysis revealed that mean firing rate was the most significant predictor across models, while contrast levels had mixed but still important effects. Principal Component Analysis (PCA) suggests that there are subtle differences in neural firing across sessions and mice but no strong clustering. Furthermore, kNN achieved the highest accuracy (73.5%), outperforming the other two classification models. However, all models struggled with class imbalance, because the data contain more successes than failures.
Although the models achieved a reasonable accuracy, they were limited in handling the class imbalance and the complex patterns of neural activity. Further analysis should consider more advanced models that can capture more complex relationships, rather than only the linear approaches used in this report. Addressing these challenges would strengthen the analysis and improve our understanding of decision-making.
Understanding the relationship between neural activity and behavioral outcomes is a critical part of neuroscience and has significant implications for research and clinical applications.
In this project, we analyze data collected by Steinmetz et al. (2019) to study decision-making tasks in mice. The analysis covers 18 sessions obtained from four mice (Cori, Forssmann, Hench, and Lederberg). Each session consists of hundreds of trials in which visual stimuli with varying contrast levels were presented to the mouse on a screen. The mice were required to make a choice (left or right), and their decisions were classified as successes (feedback of 1) or failures (feedback of -1).
The primary objective of this report is to develop a predictive model that can accurately determine the outcome of each trial from the neural activity data (spike trains recorded in the visual cortex) and the stimulus parameters (left and right contrasts). To achieve this, the project employs logistic regression and the k-nearest neighbors (kNN) method. Comparing the models will help determine which strategy predicts behavioral outcomes best.
The project will answer two questions: which predictors are most important for predicting the outcome of each trial, and which binary classification method classifies those outcomes most accurately.
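To make the kNN idea concrete, the following is a minimal sketch in Python using only the standard library. It is not the code used for this report: the function names, the choice of k = 3, the assumption that features are already standardized, and the toy trials are all hypothetical and serve only to illustrate how a trial could be classified by majority vote among its nearest neighbors.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test trials whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Made-up trials: (standardized mean firing rate, contrast_left, contrast_right).
# Labels follow the report's coding: 1 = success, -1 = failure.
train_X = [(1.0, 0.0, 1.0), (1.2, 1.0, 0.0), (0.9, 0.5, 0.0),
           (-1.0, 0.0, 0.0), (-1.1, 0.25, 0.25), (-0.8, 1.0, 1.0)]
train_y = [1, 1, 1, -1, -1, -1]

test_X = [(1.1, 0.5, 0.5), (-0.9, 0.0, 0.25)]
test_y = [1, -1]

print(knn_accuracy(train_X, train_y, test_X, test_y))
```

Because kNN relies on distances, features on very different scales (for example raw firing rates versus contrasts between 0 and 1) would usually need to be rescaled first; the toy data above assume this has already been done.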
In order to create a model, we must first understand the data we will be handling. The data cover 18 sessions from 4 mice, with 6 variables measured in each session: feedback_type, contrast_left, contrast_right, time, spks, and brain_area.
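One way to picture a single session is as a record holding these six variables. The sketch below is written in Python for illustration only; the class name `Session`, the helper methods, and the assumed shapes (one spike matrix of neurons by time bins per trial, one brain area label per neuron) are assumptions rather than a description of how the data were actually stored or loaded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical container for one recording session."""
    mouse_name: str
    date_exp: str
    feedback_type: List[int]       # one value per trial: 1 (success) or -1 (failure)
    contrast_left: List[float]     # one value per trial
    contrast_right: List[float]    # one value per trial
    time: List[List[float]]        # per trial: centers of the time bins (assumed)
    spks: List[List[List[int]]]    # per trial: neurons x time bins spike counts (assumed)
    brain_area: List[str]          # one label per neuron (assumed)

    def n_trials(self) -> int:
        return len(self.feedback_type)

    def n_neurons(self) -> int:
        return len(self.brain_area)

    def areas(self) -> List[str]:
        """Distinct brain areas recorded in this session, sorted by name."""
        return sorted(set(self.brain_area))


# A made-up session with 2 trials, 3 neurons, and 4 time bins per trial.
toy = Session(
    mouse_name="ExampleMouse",
    date_exp="2000-01-01",
    feedback_type=[1, -1],
    contrast_left=[0.0, 0.5],
    contrast_right=[1.0, 0.5],
    time=[[0.0, 0.01, 0.02, 0.03], [0.0, 0.01, 0.02, 0.03]],
    spks=[[[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0]],
          [[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]],
    brain_area=["VISp", "CA1", "VISp"],
)

print(toy.n_trials(), toy.n_neurons(), toy.areas())
```

Summaries such as the number of trials, the number of neurons, and the list of recorded areas per session can then be read directly off each record.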
Now that we are familiar with the structure of the dataset, we visualize it to better understand the trends already present in the data. In the table presented, we can see that each session was recorded from a single mouse and that the number of trials per session varies from 114 to 447. It can also be seen that many different types of neurons (brain areas) were recorded and that the number of neurons differs from session to session.
It is also important to know how the mice performed on the trials. The mice were successful 3608 times, compared with 1473 failures, so they succeeded on most trials. Next, the cross-tabulation of left and right contrast levels gives further detail about the stimulus conditions across sessions. Some conditions occurred more often than others; the most frequent, with 1371 trials, was zero contrast on both the left and the right.
Additionally, the plot shows which parts of the brain were measured in each session. Some areas were measured more often than others, which is important to note because not every area contributes data in every session. This summary sheds light on the dataset that will be used for later modeling and analysis.
##    session_number mouse_name   date_exp n_trials n_neurons
## 1               1       Cori 2016-12-14      114       734
## 2               2       Cori 2016-12-17      251      1070
## 3               3       Cori 2016-12-18      228       619
## 4               4  Forssmann 2017-11-01      249      1769
## 5               5  Forssmann 2017-11-02      254      1077
## 6               6  Forssmann 2017-11-04      290      1169
## 7               7  Forssmann 2017-11-05      252       584
## 8               8      Hench 2017-06-15      250      1157
## 9               9      Hench 2017-06-16      372       788
## 10             10      Hench 2017-06-17      447      1172
## 11             11      Hench 2017-06-18      342       857
## 12             12  Lederberg 2017-12-05      340       698
## 13             13  Lederberg 2017-12-06      300       983
## 14             14  Lederberg 2017-12-07      268       756
## 15             15  Lederberg 2017-12-08      404       743
## 16             16  Lederberg 2017-12-09      280       474
## 17             17  Lederberg 2017-12-10      224       565
## 18             18  Lederberg 2017-12-11      216      1090
##    neuron_types
## 1  ACA, MOs, LS, root, VISp, CA3, SUB, DG
## 2  CA1, VISl, root, VISpm, POST
## 3  DG, VISam, MG, CA1, SPF, root, LP, MRN, POST, NB, VISp
## 4  LGd, DG, TH, SUB, VPL, VISp, CA1, VISa, LSr, ACA, MOs
## 5  VISa, root, CA1, SUB, DG, OLF, ORB, ACA, PL, MOs
## 6  AUD, root, SSp, CA1, TH
## 7  VPL, root, CA3, LD, CP, EPd, SSp, PIR
## 8  ILA, TT, MOs, PL, LSr, root, LD, PO, CA3, VISa, CA1, LP, DG, VISp, SUB
## 9  TT, ORBm, PL, LSr, root, CA3, VISl, CA1, TH, VISam, VPL, LD
## 10 MB, VISp, SCm, SCsg, POST, DG, MRN, CA1, VISl, POL, root, GPe, VISrl
## 11 MOp, LSc, root, PT, CP, LSr
## 12 VISp, DG, SUB, LGd, PL, root, MOs, ACA, CA1, VISam, MD, LH
## 13 VISam, ZI, DG, CA1, LGd, MB, SCs, RN, MRN, SCm, ACA, PL, MS, root, MOs
## 14 ORB, MOs, root, MRN, SCm, SCs, VISp, RSP, CA1, PAG
## 15 BLA, GPe, root, VPM, LGd, ZI, MB, CA3
## 16 SSs, SSp, MB, TH, LGd, CA3
## 17 root, VPL, VPM, RT, MEA, LD
## 18 CP, ACB, OT, SI, SNr, LGd, ZI, CA3, root, TH
##
##   -1    1
## 1473 3608
##
##           0 0.25  0.5    1
##   0    1371  194  326  454
##   0.25  179   99  179  317
##   0.5   397  166  111  163
##   1     438  423  159  105
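As a quick consistency check on the two tables above, the counts can be tallied directly. The sketch below is in Python and simply re-enters the printed numbers; the variable names are hypothetical, and because the output does not say whether rows correspond to the left or the right contrast, the table is stored without that label.

```python
# Counts copied from the feedback table above.
feedback_counts = {-1: 1473, 1: 3608}

# Cross-tabulation of contrast levels, copied from the table above.
# Which side (left or right) indexes the rows is not stated in the output.
levels = [0.0, 0.25, 0.5, 1.0]
contrast_table = [
    [1371, 194, 326, 454],
    [179, 99, 179, 317],
    [397, 166, 111, 163],
    [438, 423, 159, 105],
]

total_trials = sum(feedback_counts.values())
crosstab_total = sum(sum(row) for row in contrast_table)

# Share of trials that ended in success.
success_rate = feedback_counts[1] / total_trials

# Share of trials in which both contrasts were zero.
zero = levels.index(0.0)
both_zero_share = contrast_table[zero][zero] / total_trials

print(f"total trials:        {total_trials}")
print(f"cross-tab total:     {crosstab_total}")
print(f"success rate:        {success_rate:.3f}")
print(f"both contrasts zero: {both_zero_share:.3f}")
```

Under these counts, a rule that always predicts success would be correct on roughly 71% of trials, which is a useful baseline to keep in mind when reading classifier accuracies on imbalanced data.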
Next, neural activity was examined to better understand how the mean firing rate varies across trials in each session. The spike trains were accessed and their mean values were computed to represent the overall neural response during each trial. These visualizations show that most of the mean firing rates were within 0.3 spikes per bin of each other. The fluctuation was fairly consistent across trials, which could suggest trial-to-trial variability; however, because the values all stayed within 0.3 of each other, there appears to be no long-term drift or sudden shift that we should be worried about.
Additionally, most of the graphs show no clear positive or negative trend. This could suggest that there is no strong drift in neural activity levels over time, which would mean that the animal performed consistently throughout the trials and that conditions remained steady within each session.
Each session had a different range of average firing rates, which could mean that the neural recording conditions and task parameters changed considerably between sessions. This could affect the data analysis; however, it is to be expected, as the sessions were not all conducted on the same day, which allows other variables to affect the data.
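The per-trial summary and the drift check described above can be sketched as follows, in Python with the standard library only. This is an illustration under stated assumptions, not the report's actual code: it assumes each trial's spike train is a matrix of spike counts (neurons by time bins), that the "mean value" is the average over all neurons and bins, and that drift is judged by the least-squares slope of that mean against trial index. The function names and toy data are hypothetical.

```python
def trial_mean_rate(spks):
    """Mean spike count per neuron per time bin for one trial (neurons x bins)."""
    n_values = sum(len(row) for row in spks)
    return sum(sum(row) for row in spks) / n_values


def linear_slope(values):
    """Least-squares slope of `values` against trial index 0..n-1 (needs n >= 2)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Made-up session: 3 trials, 2 neurons, 4 time bins each.
toy_trials = [
    [[0, 1, 0, 0], [1, 0, 0, 1]],
    [[0, 0, 0, 0], [1, 1, 0, 0]],
    [[1, 0, 1, 0], [0, 0, 0, 1]],
]

rates = [trial_mean_rate(t) for t in toy_trials]
print("mean rate per trial:", rates)
print("range across trials:", max(rates) - min(rates))
print("slope per trial:    ", linear_slope(rates))
```

A slope close to zero relative to the spread of the per-trial means would be consistent with the absence of drift noted above, while a clearly positive or negative slope would point to a gradual change over the session.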